Abstract 

Background: Circulating tumour cells (CTC) are receiving increasing attention as prognostic, predictive and 
pharmacodynamic biomarkers in cancer patients. However, their clinical significance can be dependent on an 
accurate determination of CTC around cut-off values at low cell counts (<10 cells/7.5 ml). Consequently, we have 
conducted method validation of the CellSearch™ system focusing on clinical samples containing CTC in the cut-off 
region. 

Methods: Analytical accuracy was first assessed employing quality controls (QC) and spiked healthy volunteer 
blood specimens. Results were analysed by β-expectation tolerance intervals (BETI). Inter-operator error (6 different 
readers) was then characterised in 38 different patient samples, 68% of which had <5 CTC, and data were analysed 
by β-content γ-confidence tolerance intervals (BCTI). 

Results: Results from QCs and spiked blood confirmed a 3-4-fold higher degree of imprecision at the low (48 cells, 
BETI = +0.288/-0.345, β = 95%) compared to the high QC (987 cells, BETI = +0.065/-0.140, β = 95%). However, when 
data for individual analysts were interrogated characteristic systematic errors were detected. In the analysis of 
patient samples again individual analysts introduced a highly specific error into the interpretation of CTC images, 
which correlated to the level of training and experience. When readers were selected based on BETI and BCTI 
results, the high level of between-operator error (up to 170%) observed at CTC of < 5 was reduced to < 30%. 

Conclusions: Inter-operator variability in enumeration of CTC at low cell counts can be considerable, but is also 
potentially avoidable by following simple guidance steps. 

Keywords: Circulating tumour cells, Predictive biomarker, Enumeration, Low cell counts, Method validation, 
Statistical analysis 



Background 

Detection, enumeration and characterisation of circulating 
tumour cells (CTC) as a potential biomarker currently 
represents one of the most actively pursued areas in translational 
cancer research [1]. CTC are believed to act as the 
'seeds' for the establishment of metastatic disease, and also 
a mechanism to re-populate the primary tumour, and 
their presence has been shown to correlate to both pro- 
gression free survival and overall survival [2-5]. In these 
studies, a discrete cut-off point was discriminated at ex- 
tremely low numbers of CTC above which poorer 
prognosis was observed: > 5 in 7.5 ml blood for metastatic 
breast cancer, metastatic castration resistant prostate can- 
cer and non-small cell lung cancer; > 3 for metastatic 
colorectal cancer; > 2 in melanoma and > 1 in neuroendo- 
crine tumours [2-8]. In addition, CTC have been shown to 
be predictive of response to both chemotherapy and 
targeted agents in post-treatment samples and to act as a 
pharmacodynamic biomarker [2,9-12]. 

Isolation of rare cells (1 CTC in 10^8 leucocytes) presents 
considerable technological challenges requiring a robust 
analytical technique and a number of different approaches 
have been developed based on the physical and biological 
properties of CTC [1,13]. To date, the CellSearch™ system 
(Veridex, Raritan, NJ, USA) remains the only platform 
that is cleared by a regulatory authority (the FDA in the 
USA), as an aid in the monitoring of patients with metastatic 
breast, colorectal and prostate cancer 
(http://www.accessdata.fda.gov/cdrh_docs/pdf7/K073338.pdf). The 
system consists of two major instrumental components: the 
AutoPrep station for the fully-automated immunomagnetic 
isolation of cells from whole blood and an Analyser for the 
semi-automated identification of CTC based on 4-channel 
fluorescence microscopy [14]. Although the system has been the 
subject of method validation studies in the past, these have 
tended to focus on either precision of quality control samples or 
between-laboratory concordance (reproducibility) [15,16]. 
Only limited studies have been performed evaluating the 
analytical accuracy of the technique - determination of the 
true value for CTC in the patient sample [17]. Nonetheless, 
CellSearch remains the benchmark against which all new 
technologies should be assessed [18]. 

Assays employed as a prognostic or predictive biomarker 
require a credible level of analytical validation [19,20] and 
in the case of CTC that should include a demonstration 
that the technique is accurate and reproducible at the cut- 
off level [18,21]. Therefore, in the present study method 
validation of the CellSearch system was conducted focusing 
on the analysis of patient samples containing low cell 
counts, in the region of the published clinically relevant 
cut-off points. To address the issue of analytical accuracy 
statistical approaches to the interpretation of data including 
β-expectation tolerance intervals (BETI) were employed 
[22]. In addition, a major goal was to achieve a reduction in 
inter-operator variability and this aspect utilised a modification 
of incurred sample reproducibility (ISR) and β-content 
γ-confidence tolerance intervals (BCTI) [23]. 

Methods 

Patients and blood sample collection 

Blood samples (7.5 ml) for CTC enumeration were collected 
from a total of 38 different lung, prostate, melanoma and 
colorectal patients receiving standard of care chemotherapy 
at the Christie Hospital, Manchester and entered into a 
number of experimental medicine studies being conducted 
at the Paterson Institute for Cancer Research. Written in- 
formed consent was obtained from all subjects and the stud- 
ies were ethically approved by the Tameside and Glossop 
Research Ethics Committee (Manchester, UK) and the Dec- 
laration of Helsinki Principles was followed. Samples were 
harvested into CellSave tubes (Veridex, Raritan, NJ, USA), 
containing EDTA and a cellular preservative and maintained 
at room temperature for no longer than 72 hours prior to 
analysis. Blood was also collected from healthy volunteers 
for recovery experiments according to a local ethics com- 
mittee approved protocol. 

CTC enumeration by CellSearch 

CTC were enumerated essentially as previously described 
in detail [7,24,25]. In brief, blood was diluted, centrifuged 
and incubated with ferrofluid particles coated with anti-EpCAM 
antibodies utilising the CellTracks AutoPrep 
station (Veridex). After immunomagnetic enrichment, 
ferrofluid-captured cells were permeabilised and fluorescently 
labelled using phycoerythrin-conjugated anti-cytokeratin 
antibodies (pan-keratin antibody C-11) to 
identify epithelial cells and allophycocyanin-conjugated 
anti-CD45 antibody to identify and discount leucocytes. 
4',6-Diamidino-2-phenylindole (DAPI) was incorporated 
to identify cell nuclei. Upon repeated magnetic separation, 
the fluorescently labelled cells were oriented to 
the surface of the (MagNest™) cartridge for interrogation 
using the CellTracks Analyser II (Veridex). Image frames 
covering the entire surface of the cartridge were captured 
by the software and a gallery of objects meeting pre- 
determined criteria presented to the analyst to confirm or 
otherwise the presence of CTC. Image galleries were 
assessed by the operator without prior knowledge of pa- 
tient data. A CTC is defined as a nucleated cell staining 
positively for cytokeratin and negatively for CD45 and re- 
sults are reported as CTC number per 7.5 ml of blood. 

Experimental studies 

To establish the level of analytical accuracy achievable by 
the CellSearch system, the quality control (QC) reagents 
provided by Veridex were utilised. These are certified to 
contain a specified range of human tumour cells (SK-BR3 
cells), at a high and low cell count, and are integral to the 
quality control procedures of the system. The statistical 
evaluation of analytical accuracy employed β-expectation 
tolerance intervals (see below). QC data obtained over a 3 
month period during the analysis of 27 different batches 
of patient samples were collated for statistical evaluation. 
Variables investigated included the influence of two differ- 
ent CellSearch systems and 3 different operators. 

In a second statistical evaluation of analytical accuracy 
by BETI, healthy donor blood was spiked with a known 
number of human tumour cells according to the following 
protocol. Approximately 30 ml of normal donor blood was 
collected into a CellSave tube. Growing cultures of either 
SW620 or H1048 cells were trypsinised and re-suspended 
in 1 ml phosphate buffered saline (PBS) and counted. Cells 
were re-suspended to a final concentration of either 3 or 
25 in 100 µl (i.e. 30 or 250/ml) in PBS. 100 µl of cell 
suspension was then added to 3 empty CTC isolation tubes 
followed by 7.5 ml of normal donor blood. A control sam- 
ple of 7.5 ml of normal donor blood with no spiked cells 
was also included. The number of cells spiked to each tube 
was unknown to the 6 different operators who then 
enumerated the cell numbers by CellSearch. 

The final experiment involved the interrogation by 6 
different operators of the same image galleries obtained 
from the analysis of 38 different patient samples. Ana- 
lysts were selected on the basis of varying levels of 
training and experience. In the statistical evaluation of 
the resultant data a two-sided β-content γ-confidence 
tolerance interval was employed (see below). 

Mathematical calculations and statistical analysis 

Calculation of BETI was performed utilising MATLAB 
(Version R2009a, MathWorks, Natick, MA, United States) 
as described previously [26]. Tolerance intervals were 
calculated at β = 67%, 80% and 95%. A plot of BETI (y-axis) 
against the nominal concentration of the QCs or spike 
(x-axis) is referred to as the 'accuracy profile' and is used 
extensively throughout this report. 

Evaluation of ISR utilised BCTI for statistical analysis of 
data. This yields an upper and lower interval where a specified 
(β) proportion of measurements will lie with a specified 
(γ) level of confidence and was calculated as previously 
reported [23]. In our adaptation of this methodology, where 
normally a single operator assays the same samples twice 
(or more), data from a pair of operators who assayed the 
same samples a single time were substituted into the calcu- 
lations in order to characterise the relative error introduced 
by each. Here, 

$Y_i^{O}$ = the original measurements (i.e. analyst 1). 

$Y_i^{R}$ = the repeat measurements (i.e. analyst 2). 

$$\Delta_i = \log\left(Y_i^{O}\right) - \log\left(Y_i^{R}\right)$$

$$\bar{\Delta} = \frac{1}{N}\sum_{i=1}^{N}\Delta_i$$

$$\hat{\sigma}_{\Delta}^{2} = \frac{1}{N-1}\sum_{i=1}^{N}\left(\Delta_i - \bar{\Delta}\right)^{2}$$

Where $\Delta_i$ is the difference between original and repeat 
measurements in log transformed concentrations, $N$ is 
the number of patient samples, $\bar{\Delta}$ is the mean of the 
differences between the log transformed concentrations 
and $\hat{\sigma}_{\Delta}^{2}$ is the variance of the differences between the log 
transformed concentrations. The two tailed β-content 
γ-confidence tolerance interval is therefore defined as: 

$$\bar{\Delta} \pm z_{(1+\beta)/2}\,\sqrt{1+N^{-1}}\,\sqrt{\frac{(N-1)\,\hat{\sigma}_{\Delta}^{2}}{\chi^{2}_{1-\gamma,\,N-1}}}$$

where $z_{(1+\beta)/2}$ is the $(1+\beta)/2$ quantile of the standard 
normal distribution and $\chi^{2}_{1-\gamma,\,N-1}$ is the lower $(1-\gamma)$ quantile of the 
chi-squared distribution ($N-1$ degrees of freedom). 
Calculation of BCTI was performed utilising MATLAB (as 
above) at β = 67% and 95% [26]. A plot of BCTI (y-axis) 
against the operator pair (x-axis) represents a modified 
form of the 'accuracy profile'. All code developed in 
MATLAB was validated against previously published 
data sets as reported previously [26]. 
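
The tolerance interval calculation described above can be illustrated with a minimal Python sketch. This is not the MATLAB code used in the study. It assumes natural logarithms, strictly positive counts (the handling of zero counts is not specified here) and, to stay within the standard library, replaces the exact chi-squared quantile with the Wilson-Hilferty approximation. The function names `chi2_quantile_wh` and `bcti` and the example counts are hypothetical.

```python
import math
from statistics import NormalDist, mean, variance


def chi2_quantile_wh(p, dof):
    """Approximate lower-p quantile of chi-squared (Wilson-Hilferty)."""
    z = NormalDist().inv_cdf(p)
    c = 2.0 / (9.0 * dof)
    return dof * (1.0 - c + z * math.sqrt(c)) ** 3


def bcti(original, repeat, beta=0.95, gamma=0.95):
    """Two-sided beta-content gamma-confidence tolerance interval
    for the log differences between paired counts (assumed > 0)."""
    deltas = [math.log(o) - math.log(r) for o, r in zip(original, repeat)]
    n = len(deltas)
    if n < 3:
        raise ValueError("at least 3 paired samples are needed")
    d_bar = mean(deltas)
    s2 = variance(deltas)  # sample variance, N-1 denominator
    z = NormalDist().inv_cdf((1.0 + beta) / 2.0)
    chi2 = chi2_quantile_wh(1.0 - gamma, n - 1)
    half_width = z * math.sqrt(1.0 + 1.0 / n) * math.sqrt((n - 1) * s2 / chi2)
    return d_bar - half_width, d_bar + half_width


# Hypothetical counts from two analysts reading the same image galleries.
analyst_a = [4, 6, 10, 3, 7, 2]
analyst_b = [5, 5, 9, 4, 8, 2]
low, high = bcti(analyst_a, analyst_b, beta=0.95, gamma=0.95)
print(f"BCTI (log scale): {low:+.3f} / {high:+.3f}")
```

Where SciPy is available, `scipy.stats.chi2.ppf(1 - gamma, n - 1)` could be substituted for the approximation; the difference is small at the sample sizes used here but grows as N becomes very small.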



Results 

Analytical accuracy of the CellSearch system characterised 
by β-expectation tolerance intervals 

Figure 1A contains the accuracy profiles and BETI gener- 
ated from the QC data obtained over a 3 month period 
during the analysis of 27 different batches of patient sam- 
ples. In keeping with previously published data [15,27], the 
error associated with the analysis of CTC at lower numbers 
was 2-3 fold greater than at higher numbers. There 
appeared to be little evidence of bias (systematic error) at 
either high or low CTC numbers, where the tolerance 
intervals at β = 67, 80 and 95% were symmetrically centred 
about the mid-point of the certified range for the QCs. In 
absolute terms, the tolerance intervals rarely exceeded a 
margin of 30% (even at 95% probability), the recommended 
benchmark for a biomarker assay in the fit-for-purpose ap- 
proach to method validation [17]. 

Nonetheless, when the equivalent data were plotted for 
each operator involved in the analysis (see Figure 1B-D), 
striking differences in the resultant accuracy profiles 
emerged (P < 0.05, ANOVA). Analyst 1 introduced a posi- 
tive bias in the determination of the low QC which was sig- 
nificantly different from the performance of the other two 
analysts (Newman-Keuls multiple comparison test) while 
Analyst 2 introduced a negative bias in the low QC. A large 
degree of imprecision (random error) was evident in the 
low QC data attributed to Analyst 3 coupled to a small 
negative bias in the high QC. Identification of such 
discrete analytical errors allows for the possibility of their 
correction, demonstrating the potential power of the BETI 
approach to method validation. No significant differences 
in QC values were recorded when the results obtained 
from the two separate CellSearch systems were compared 
(Student's t test). 

Six different operators independently interrogated the 
image galleries produced by the analysis of volunteer blood 
samples spiked with low numbers of tumour cells (3 and 
25, n = 3). Here, BETI for the samples spiked with 25 cells 
was +0.562 and -0.546 at β = 95%, total error was 24.5% 
and average recovery 101% ± 24% coefficient of variation 
(CV), consistent with a large degree of random error but 
absence of systematic error and in keeping with previous 
studies [16]. Due to the relatively small number of speci- 
mens in this study, it was not possible to discriminate the 
individual contribution of each analyst towards the overall 
level of error. BETI for the samples spiked with 3 cells was 
+0.486 and -0.264 at β = 95%, total error was 28.3% and 
average recovery was 90% ± 9.6%. 
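
As an illustration of how average recovery and its CV are obtained in a spiking experiment of this kind, a minimal Python sketch follows. The counts are hypothetical and are not the study data, the helper name `recovery_summary` is an assumption, and the sample (N-1) standard deviation is assumed for the CV.

```python
from statistics import mean, stdev


def recovery_summary(counts, spiked):
    """Return (mean recovery %, CV %) for counts from one spike level."""
    recoveries = [100.0 * c / spiked for c in counts]
    avg = mean(recoveries)
    cv = 100.0 * stdev(recoveries) / avg  # undefined if every count is zero
    return avg, cv


# Hypothetical counts reported for tubes nominally spiked with 25 cells.
avg, cv = recovery_summary([25, 20, 30], spiked=25)
print(f"mean recovery {avg:.0f}%, CV {cv:.0f}%")
```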

Figure 1 Determination of analytical accuracy in CTC enumeration utilising BETI and QC samples. Certified QC samples containing SK-BR3 
human tumour cells spiked at high and low numbers were assayed by a pool of analysts over a 3 month period in order to construct tolerance 
intervals (±) at β = 95%, 80% and 67%. Combined tolerance intervals for all analysts (a) versus tolerance intervals for a single analyst: (b), (c) and (d). 

Incurred sample reproducibility of the CellSearch system 
characterised by β-content γ-confidence tolerance intervals 

The nature and extent of inter-operator error in CTC 
enumeration by CellSearch was investigated through 
ISR, applying this concept to different pairings of 
analysts (Figure 2A-D). Of all patient samples analysed 
and interrogated in this evaluation a total of 68% had 
nominal CTC values of <5. Figure 2A contains the 
'modified' accuracy profiles and BCTI for a pairing of all 6 
operators, each of whom was selected on the basis of differing 
levels of training and experience. Analyst 1 was the bench- 
mark, the most highly trained and experienced member of 
the group. The level of error recorded between the differ- 
ent pairings was comparable and in certain cases (e.g. ana- 
lyst pair 1 and 4) less than that observed in the analysis of 
the QC samples (see Figure 1), a phenomenon previously 
noted in biomarker method validation of 2 cell death 
ELISA assays [26]. The magnitude of error between pairs 
of operators followed very closely their level of training 
and experience. Analysts 4 and 5 underwent professional 
training at the Veridex European Centre and were highly 
experienced whereas Analyst 6 had only recently received 
in-house training, while Analysts 2 and 3 were intermedi- 
ate in experience. 

When Analyst 2 was compared against the other opera- 
tors, there was a considerable increase in the tolerance inter- 
vals recorded (approximately 2-fold) and the introduction of 
a strong positive bias (Figure 2B). Likewise, in the case of 
Analyst 3, who was intermediate in experience, there was 
also a large increase in the level of random error but 
without any notable bias (Figure 2C). Analyst 6, the most 
inexperienced operator, was consistently associated with a 
much greater level of error than any other analyst (see 
Figures 2A-D). Apart from Analyst 1, the benchmark oper- 
ator, the two other experienced analysts - 4 and 5 - 
appeared to be able to function within or close to the 30% 
margin of error recommended as acceptable for biomarker 
assays [17] (Figure 2D). 

Amelioration of the inter-operator error associated with 
the enumeration of low CTC utilising BETI and BCTI 

Through the application of both certified QCs and 
patient samples, and employing the statistical proce- 
dures of BETI and BCTI in data interpretation, distinct 
performance characteristics associated with different 
analysts have been identified. Figure 3A displays the 
inter-operator error in CTC enumeration at low counts 
as CVs when data from all 6 analysts were included. 
The graph illustrates a profile typical of that obtained in 
previous analogous studies, where there is essentially an 
exponential increase in error as the cell count 
approaches zero [16,24,27]. By selecting only analysts who 
through BETI and BCTI analysis have demonstrated 
consistency (analysts 1, 4 and 5), the high level of 
between-operator error observed at CTC of < 5 was 
significantly reduced and the CV in virtually all cases was 
less than 30% and in a majority of cases less than 20% 
(Figure 3B). 

Figure 2 Characterisation of inter-operator variability in CTC enumeration at low cell counts utilising BCTI and patient samples. Image 
galleries generated from the analysis of different cancer patient blood samples, 68% of which had <5 CTC, were enumerated by a pool of 
operators. Results were analysed by a modification of incurred sample reproducibility where the counts obtained by a pair of operators who 
interrogated the same samples were substituted into the calculations. Tolerance intervals (±) were constructed at β = 95% and 67% and the 
±30% error line is shown for reference. (a) Analyst 1 versus analysts 2-6; (b) analyst 2 versus analysts 3-6; (c) analyst 3 versus analysts 4-6; 
and (d) analyst 4 versus 5 and 6 and analyst 5 versus 6. 
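
The between-operator CV plotted in Figure 3 could be computed per sample along the following lines. This is a sketch with hypothetical counts; the helper name `inter_operator_cv` is an assumption, as is the use of the sample (N-1) standard deviation. A sample in which every analyst counted zero has no defined CV and is reported as `None`.

```python
from statistics import mean, stdev


def inter_operator_cv(counts_by_sample):
    """For each sample, return (mean count, CV %) across analysts."""
    results = []
    for counts in counts_by_sample:
        m = mean(counts)
        cv = 100.0 * stdev(counts) / m if m > 0 else None
        results.append((m, cv))
    return results


# Rows are patient samples, columns are analysts (hypothetical values).
panel = [[2, 4, 6], [10, 11, 12], [0, 0, 0]]
for m, cv in inter_operator_cv(panel):
    label = "undefined" if cv is None else f"{cv:.1f}%"
    print(f"mean CTC = {m:.1f}, inter-operator CV = {label}")
```

Re-running the same function on a subset of columns (for example, only the analysts shown to be consistent) gives the reduced-panel profile.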



Figure 3 Amelioration of inter-operator variability in CTC enumeration in patient samples at low cell counts. Between-operator error was 
calculated as the coefficient of variation (CV) in the mean cell count obtained by a panel of analysts enumerating CTC in 38 different cancer 
patient blood samples. (a) Inter-operator error for all 6 operators. (b) Reduced level of error obtained when only analysts 1, 4 and 5 (see Figure 2) 
who had demonstrated consistency were included. The 30% error line and 5 CTC line are shown for reference. 

Discussion 

The aim of the present study was to improve the accuracy 
of CTC enumeration by the CellSearch system at low cell 
counts (<5) utilising the statistical procedures of BETI and 
BCTI. In a seminal paper published by scientists based at 
Veridex (then called Immunicon Corporation), a statistical 
model was developed to describe the main sources of 
error associated with the determination of CTC in human 
blood [24]. Three major error components were identified: 
a) sample collection, b) recovery of CTC through 
immunomagnetic depletion and c) inter-reader variability 
in the assignment of objects. A stimulus for conducting this 
study was the apparently arbitrary nature of the 
cut-off point of 5 CTC observed in metastatic breast 
cancer patients [2,24]. No clear biologic basis could be 
proffered for such a discrete value [3,8,24,28]. 

To explain the uncertainties associated with sample 
collection in CTC enumeration Poisson statistics were 
incorporated into the error model [24]. Poisson statistics 
are believed to describe most accurately the effect of 
counting randomly distributed objects, such as CTC, in 
a discrete volume [28]. The effect of Poisson statistics on 
CTC enumeration is illustrated as follows. Where the 
true number of CTC is 5, the probability of detecting 5 
cells in a single sample collected from a patient is rela- 
tively small (17.5%). A feature of the Poisson distribution 
is that the variance is equal to the population mean. 
Thus, the theoretical CV for a set of measurements car- 
ried out on the same samples, based entirely on statis- 
tical probabilities, is 44.7% at 5 CTC. A number of 
previous validation studies, including the present work, 
have confirmed good agreement between the level of ex- 
perimental error measured at low CTC and the theoret- 
ical level of error dictated by Poisson statistics [16,27]. 
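
The two figures quoted above follow directly from the Poisson distribution and can be checked with a few lines of Python. This is a minimal sketch; the function names are illustrative only.

```python
import math


def poisson_pmf(k, lam):
    """Probability of counting exactly k objects when the mean is lam."""
    return math.exp(-lam) * lam ** k / math.factorial(k)


def poisson_cv(lam):
    """Theoretical CV: sqrt(variance) / mean = sqrt(lam) / lam."""
    return 1.0 / math.sqrt(lam)


true_ctc = 5
print(f"P(count = 5) = {100 * poisson_pmf(5, true_ctc):.1f}%")  # 17.5%
print(f"theoretical CV = {100 * poisson_cv(true_ctc):.1f}%")    # 44.7%
```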

While it is difficult to control for the uncertainties in- 
troduced in the analysis of CTC by Poisson statistics, 
other than collecting a larger volume or many replicates 
of the same sample [24,28,29], the other two compo- 
nents in the error model are more amenable to correc- 
tion. In the case of sample recovery, this component has 
been demonstrated to exert only a modest effect [16,27], 
which was confirmed in the present study using spiked 
healthy volunteer blood samples. 

Where there is scope for improvement is in the area of 
inter-reader variability, which has been proposed as one 
of the main reasons to explain the arbitrary nature of the 
cut-off levels observed in clinical trials [3,24,28]. Indeed, 
it is possible that due to experimental error the true cut- 
off levels may be even lower than those previously 
reported [24]. Inter-operator error was also identified as 
the major contributor to between-laboratory variations 
observed during an external quality assurance assess- 
ment programme [21]. 

To apply BETI to method validation requires that the true 
or a certified value of the analyte of interest is known [30]. 
This limited our evaluation to data derived from the QC 
samples provided by the vendor (Veridex). BETI is normally 
associated with bioanalytical techniques, although there are 
limited examples of its application in cut-off interpretation 
of ELISA data [31]. To the best of our knowledge, the 
present report represents the first to apply this approach to 
method validation of the CellSearch system for CTC 
analysis. The strength of this procedure is that it informs on 
analytical accuracy and the quality of result obtained in 
future measurements, at any operator defined level of 
probability [30,32]. Two significant observations emerged from 
the analysis of the QC samples. First, the analysis confirmed that a high 
level of analytical accuracy was possible, with virtually no 
bias observed at the high QC coupled to a low level of 
imprecision. Here the total error was always less than 15%, 
well within the recommended level for a biomarker tech- 
nique [33,34]. Second, and most importantly, it identified 
that at lower CTC, characteristic systematic and random 
errors could be ascribed to individual analysts. 

However, it is also well recognised that QC samples often 
reflect poorly the analytical behaviour of clinical specimens, 
especially in the biomarker field [35,36]. In the case of the 
CellSearch system, the QCs comprised breast cancer cells 
reconstituted into a non-biologic matrix. Therefore, the 
main focus of the present paper revolved around the ana- 
lysis of cancer patient specimens containing low numbers 
of CTC (<5). To identify inter-reader errors the ISR meth- 
odology described by Hoffman was adopted utilising BCTI 
[23]. In our modification of this process we substituted a 
different operator to analyse the repeat sample. To control 
against sampling artefacts, a relatively large number of pa- 
tient specimens were analysed in a number of different 
assays [23]. Results obtained clearly showed that an individ- 
ual analyst could introduce a highly specific error into the 
interpretation of CTC images, analogous to the QC data. 
However, it was also demonstrated that with training and 
experience these errors could be significantly reduced. 

Increasingly, the regulators in the USA and Europe are 
placing more stringent requirements on the validation of 
biomarker assays [36-38]. Hence, validation data will be ad- 
judged in the future, not merely on the basis of technical 
performance characteristics with QCs [39], but in terms of 
the quality and significance of the data generated during 
the analysis of clinical specimens. As the gold standard, the 
most reliable biomarker data will be derived from multi- 
centre trials, where the analysis is subject to external profi- 
ciency testing schemes and inter-laboratory comparison 
programmes [18]. This is especially true when data gener- 
ated by the assay is intended to be used in the stratification 
of patients into different treatment arms, thus defining the 
biomarker as integral to the progress of the trial [19,20]. 
Here, the assay will have to be demonstrated to possess a 
proven ability to discriminate between different cohorts with 
a high degree of diagnostic sensitivity and specificity [37,38]. 
The first clinical study to conduct patient stratification based 
on a CTC cut-off value is SWOG S0500 (NCT00382018) in 
women with metastatic breast cancer receiving chemother- 
apy [18,40]. Other stratification trials employing CTC as a 
biomarker are likely to follow, such as the 'CriTiCal Trial' 
(Circulating Tumour Cell guided Chemotherapy Trial in 
Colorectal Cancer) planned in the UK. 

Conclusions 

It has been shown that while inter-operator variability in 
enumeration of CTC at low cell counts can be considerable, 
it can be ameliorated by following simple guidance steps. 
First, and perhaps logically, operators should be trained to a 
high degree and experienced in the field. Second, utilising 
statistical based techniques, potential analysts should con- 
firm in a training set of images derived from 30-50 patient 
samples run in 5-10 different assays, that consistency can 
be achieved against benchmark analysts. Finally, CTC enu- 
meration in patient samples should be conducted on 2 or 
preferably 3 separate collections of blood, with a different 
operator analysing each, for two reasons. First, by collecting 
up to 3 samples, one may attenuate the unavoidable un- 
certainties imposed on CTC enumeration by Poisson sta- 
tistics, increasing the probability of detecting > 1 (when 
present) up to 95% [24]. Second, by employing different 
analysts one can confirm or otherwise the lack of inter- 
operator variability. 
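
One way to see where a figure of about 95% for three collections can arise is the simple Poisson calculation sketched below. It rests on assumptions that are not stated in the text: a true mean of 1 CTC per 7.5 ml tube, independent collections, and detection defined as counting at least one CTC across all tubes. The derivation in [24] may differ.

```python
import math


def prob_at_least_one(mean_per_tube, n_tubes):
    """P(at least one CTC in n independent tubes) under Poisson sampling."""
    return 1.0 - math.exp(-mean_per_tube * n_tubes)


# Assumed true mean of 1 CTC per 7.5 ml tube (illustrative only).
for n in (1, 2, 3):
    print(f"{n} tube(s): {100 * prob_at_least_one(1.0, n):.1f}%")
```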